Understand content, construct, and criterion-related validity in employee selection. Learn how to build defensible hiring systems backed by empirical evidence.
"Validity is an integrated evaluative judgment of the degree to which empirical evidence and theoretical rationales support the adequacy and appropriateness of inferences and actions based on test scores." — Samuel Messick, American Psychologist (1995)
What if your carefully designed selection process isn't actually measuring what you think it measures—and you wouldn't know until expensive hiring mistakes pile up? Validity represents the cornerstone of defensible, effective employee selection. It answers the fundamental question: Does your selection tool actually predict success in the job?
Modern validity frameworks distinguish between three primary types of evidence—content, construct, and criterion-related—each providing different assurance that your hiring decisions rest on solid ground. Understanding these distinctions and how to build evidence for each transforms hiring from intuition-based guesswork into a science grounded in empirical reality.
This matters not only for organizational effectiveness but also for legal compliance and ethical fairness. This article provides a comprehensive guide to validity evidence, empirical benchmarks, and practical implementation strategies.
Validity in employee selection refers to the extent to which a selection tool (test, interview, assessment, or combination of tools) actually measures what it is designed to measure and predicts job-related outcomes. Validity is not the same as reliability: a test can be highly reliable (consistently measuring something) while being completely invalid (measuring the wrong thing).
The relationship is straightforward: A valid selection tool enables hiring better-performing employees, reduces costly turnover, decreases training burden, and supports fair, defensible hiring decisions. Invalid selection tools waste resources on unqualified hires while potentially discriminating against protected groups.
Critical Example: A company uses conscientiousness personality assessments to predict sales performance. Research shows conscientiousness is a valid predictor of general job performance (ρ = .26 across jobs). However, the relationship varies dramatically between incumbent (current employee) samples and applicant samples. A meta-analysis across 35 studies found that conscientiousness-job performance correlations were significantly higher in concurrent validity designs with incumbents (ρ = .27) compared to predictive validity designs with applicants (ρ = .13)—a difference of more than half the effect size.
Modern validity frameworks (grounded in the Standards for Educational and Psychological Testing and Downing's validity framework) organize validity evidence into distinct categories, each addressing different aspects of whether a selection tool is defensible.
Content validity refers to the degree to which a selection tool adequately covers the knowledge, skills, abilities, and other characteristics (KSAOs) required by a specific job. A selection tool has strong content validity when it samples representative behaviors and competencies that employees actually perform in the job.
How Content Validity Is Established: Content validity evidence begins with systematic job analysis—a rigorous process of identifying what competencies matter for job success. This typically involves: (1) Subject Matter Expert (SME) consensus on required competencies; (2) Competency mapping showing which job tasks require which competencies; (3) Representative item development reflecting actual job scenarios; (4) Expert review and revision until item-competency alignment is confirmed.
Empirical Measures: Content Validity Index (CVI) is the most widely used quantitative measure. It's calculated as the proportion of SMEs rating items as "relevant" to the job. Item-level CVI (I-CVI) values < 0.7 typically need elimination or revision; > 0.79 are considered appropriate. Scale-level CVI (S-CVI) values > 0.80 indicate acceptable content validity.
Empirical Example: In a study of 356 IT professionals assessing a work-stress tool, the S-CVI was 0.829, indicating acceptable content validity. Modified Kappa adjusts CVI for chance agreement, providing a more conservative estimate. Kappa values of 0.6-0.8 indicate substantial agreement; 0.8-1.0 indicates almost perfect agreement.
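To make the arithmetic concrete, here is a minimal sketch of how I-CVI, a scale-level CVI (using the averaging method), and the modified kappa adjustment are computed. It assumes a hypothetical panel of six SMEs rating items on a standard 4-point relevance scale; the ratings and item count are invented for illustration and are not drawn from the study above.

```python
from math import comb

def i_cvi(ratings):
    """Item-level CVI: proportion of experts rating the item relevant (3 or 4 on a 4-point scale)."""
    relevant = sum(1 for r in ratings if r >= 3)
    return relevant / len(ratings)

def modified_kappa(icvi, n_experts):
    """Adjust an I-CVI for chance agreement (modified kappa, k*)."""
    a = round(icvi * n_experts)                   # experts who rated the item relevant
    pc = comb(n_experts, a) * 0.5 ** n_experts    # probability of that agreement by chance
    return (icvi - pc) / (1 - pc)

def s_cvi_ave(item_cvis):
    """Scale-level CVI, averaging method: mean of the item-level CVIs."""
    return sum(item_cvis) / len(item_cvis)

# Hypothetical panel of 6 SMEs rating 4 items on a 1-4 relevance scale
ratings_by_item = [
    [4, 4, 3, 4, 3, 4],   # all 6 rate relevant -> I-CVI = 1.00
    [4, 3, 3, 2, 4, 3],   # 5 of 6 rate relevant -> I-CVI = 0.83
    [2, 3, 4, 2, 3, 2],   # 3 of 6 rate relevant -> I-CVI = 0.50, flag for revision
    [4, 4, 4, 3, 3, 4],   # all 6 rate relevant -> I-CVI = 1.00
]

icvis = [i_cvi(r) for r in ratings_by_item]
print("I-CVIs:", [round(v, 2) for v in icvis])
print("S-CVI/Ave:", round(s_cvi_ave(icvis), 2))
print("Modified kappa (item 2):", round(modified_kappa(icvis[1], 6), 2))
```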
Construct validity addresses whether a selection tool actually measures the underlying construct (competency, ability, trait) it claims to measure. A test can be content-valid (covering job-relevant content) while lacking construct validity (not accurately measuring the intended construct).
The "black box of selection" refers to a critical problem: A selection tool may predict job performance, but which constructs are responsible for the prediction may be unclear. A tool designed to measure leadership might actually be measuring confidence or extroversion instead. Understanding what constructs are responsible for prediction is essential for both effectiveness and fairness.
Evidence for Construct Validity: Internal Structure Evidence examines whether the test measures the constructs it claims to measure through factor analysis and item-construct mapping. Convergent validity shows the assessment correlates highly with other measures of the same construct. Divergent validity shows the assessment doesn't correlate with measures of different constructs.
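As a concrete illustration of convergent and divergent checks, the sketch below simulates hypothetical applicant data and compares the correlation between a new measure and an established measure of the same construct against its correlation with an unrelated construct. The measures, sample size, and effect sizes are invented for the example.

```python
import numpy as np

# Hypothetical standardized scores for 200 applicants on three measures:
# a new leadership assessment, an established leadership measure (same construct),
# and an extraversion scale (different construct).
rng = np.random.default_rng(42)
n = 200
leadership_true = rng.normal(size=n)
new_leadership = leadership_true + rng.normal(scale=0.6, size=n)
established_leadership = leadership_true + rng.normal(scale=0.6, size=n)
extraversion = rng.normal(size=n)  # unrelated construct in this simulation

convergent_r = np.corrcoef(new_leadership, established_leadership)[0, 1]
divergent_r = np.corrcoef(new_leadership, extraversion)[0, 1]

# Convergent validity: expect a substantial correlation with the same-construct measure.
# Divergent (discriminant) validity: expect a near-zero correlation with a different construct.
print(f"Convergent r = {convergent_r:.2f}")
print(f"Divergent r  = {divergent_r:.2f}")
```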
Empirical Example from Medical School Selection (Schreurs et al., 2019): Researchers examined internal structure validity using cognitive diagnostic modeling on 547 applicants. Results revealed 89% blueprint overlap between intended and measured constructs. For the video-based situational judgment test (V-SJT): Collaboration showed 0.99 attribute-level accuracy; Reflection: 0.90; Empathy: 0.86; Ethical awareness: 0.87. These accuracies above 0.67 indicate that items adequately classified applicants into competency categories.
Criterion-related validity is perhaps the most important validity type for selection. It addresses whether test scores correlate with job-related criteria (performance, productivity, retention, customer satisfaction, etc.).
Predictive Validity: The most rigorous and practically meaningful evidence. Test scores are collected before hiring, then after employees work in the job (typically 6-12 months), their job performance is measured. The correlation between test scores and job performance measures indicates predictive validity. Advantages: High fidelity to real selection contexts. Challenges: Requires patience (waiting months for outcomes), large sample sizes, and complete performance data.
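At its core, the computation is a simple correlation between pre-hire test scores and later performance ratings for the same people. A minimal sketch, using simulated data for a hypothetical cohort of 150 hires and assuming NumPy and SciPy are available:

```python
import numpy as np
from scipy import stats

# Hypothetical predictive-validity study: test scores collected at hire,
# supervisor performance ratings collected 12 months later for 150 hires.
rng = np.random.default_rng(7)
test_scores = rng.normal(50, 10, size=150)
performance = 0.3 * (test_scores - 50) / 10 + rng.normal(0, 0.95, size=150)

r, p_value = stats.pearsonr(test_scores, performance)
print(f"Predictive validity r = {r:.2f} (p = {p_value:.3f})")
```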
Concurrent Validity: Assesses current employees—administering a test to incumbents and correlating scores with their current performance. Advantages: Quick, requires smaller samples. Challenges: Incumbents are more homogeneous and higher-performing than applicants (restriction of range), potentially overestimating validity.
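Because range restriction in incumbent samples biases the observed correlation, validity researchers often apply a statistical correction before interpreting concurrent results. The sketch below implements the standard Thorndike Case II correction for direct range restriction on the predictor; the correlation and standard deviation values are hypothetical.

```python
import math

def correct_for_range_restriction(r_restricted, sd_unrestricted, sd_restricted):
    """Thorndike Case II correction for direct range restriction on the predictor.

    r_restricted   : observed validity coefficient in the restricted (incumbent) sample
    sd_unrestricted: predictor SD in the applicant (unrestricted) population
    sd_restricted  : predictor SD in the incumbent (restricted) sample
    """
    u = sd_unrestricted / sd_restricted
    return (r_restricted * u) / math.sqrt(1 + r_restricted**2 * (u**2 - 1))

# Hypothetical concurrent study: r = .20 among incumbents whose test-score SD (8.0)
# is narrower than the applicant pool's SD (12.0).
print(round(correct_for_range_restriction(0.20, 12.0, 8.0), 2))  # ~0.29
```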
Research across selection tools reveals substantial variation in criterion-related validity:
Employment Interviews: A meta-analysis of 37 studies (N = 30,646 participants) examining interview validity for specific constructs found: Task performance prediction: ρ = .30 (95% CI: .21 to .38); Contextual performance prediction: ρ = .28 (95% CI: .19 to .36). Structured interviews with targeted scoring procedures showed higher validity than unstructured approaches.
Conscientiousness Assessments: The relationship varies critically by design: Incumbent samples (concurrent validity): ρ = .27; Applicant samples (predictive validity): ρ = .13. This roughly 50% reduction in validity from incumbent to applicant samples reflects range restriction, potential faking, and personality changes.
Cognitive Ability Tests: Among the most studied selection tools: General cognitive ability: ρ = .28 to .51 (depending on job complexity; higher for complex jobs); Specific ability tests (quantitative, verbal, spatial): ρ = .15 to .40.
Practical Interpretation: ρ = .30 to .40: Practically meaningful, predicts job performance significantly better than chance. ρ = .45 to .55: Strong prediction, substantially improves hiring accuracy. ρ = .20 or below: Weak prediction, may not justify the cost of administration.
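One way to translate these coefficients into practical terms is classical selection utility logic (Brogden; Naylor-Shine): under top-down selection, the expected standardized performance gain among those hired is approximately the validity coefficient times the mean standardized predictor score of the selected group. The sketch below illustrates this under normality assumptions, with a hypothetical selection ratio of 20%; it is a simplification, not a full utility analysis.

```python
from scipy.stats import norm

def expected_performance_gain(validity, selection_ratio):
    """Expected mean standardized job performance of those hired under top-down selection:
    gain = validity * mean standardized predictor score of the selected group."""
    z_cut = norm.ppf(1 - selection_ratio)                 # predictor cutoff for the given selection ratio
    mean_z_selected = norm.pdf(z_cut) / selection_ratio   # mean predictor z among those hired
    return validity * mean_z_selected

# Hiring the top 20% of applicants with tools of differing validity
for rho in (0.20, 0.30, 0.45):
    print(rho, "->", round(expected_performance_gain(rho, 0.20), 2), "SD gain in performance")
```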
Most organizations use multiple assessment tools (applications, interviews, tests, work samples, background checks). Each tool provides partial prediction. The question becomes: How should they be combined?
Incremental Validity: Incremental validity asks whether adding a new tool improves prediction beyond existing tools. Research provides guidance: Interviews + cognitive tests: Interview adds approximately .05 to .10 to cognitive test validity; Cognitive tests + work samples: Work samples add approximately .10 to .15 to cognitive test validity; Multiple structured interviews: Second interviewer adds minimal incremental validity unless trained on different competencies.
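Incremental validity is typically estimated with hierarchical regression: fit the existing predictor first, add the new tool, and examine the change in R². A minimal sketch on simulated data for a hypothetical sample of 300 hires, assuming scikit-learn is available (the effect sizes are invented for demonstration):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: cognitive test scores, structured-interview scores,
# and later performance ratings for 300 hires.
rng = np.random.default_rng(1)
n = 300
cognitive = rng.normal(size=n)
interview = 0.3 * cognitive + rng.normal(scale=0.95, size=n)
performance = 0.4 * cognitive + 0.2 * interview + rng.normal(scale=0.9, size=n)

# Step 1: cognitive test alone
X1 = cognitive.reshape(-1, 1)
r2_step1 = LinearRegression().fit(X1, performance).score(X1, performance)

# Step 2: add the interview and examine the change in R²
X2 = np.column_stack([cognitive, interview])
r2_step2 = LinearRegression().fit(X2, performance).score(X2, performance)

print(f"R² cognitive only:        {r2_step1:.3f}")
print(f"R² cognitive + interview: {r2_step2:.3f}")
print(f"Incremental validity (ΔR²): {r2_step2 - r2_step1:.3f}")
```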
Practical Approach: (1) Start with job analysis identifying critical competencies; (2) Select tools with documented validity evidence for your specific job; (3) Use structured approaches rather than unstructured; (4) Combine tools strategically (assess different competencies, not redundant areas); (5) Validate locally when possible; (6) Monitor outcomes comparing selection tool scores to actual employee performance data.
Validity transforms employee selection from subjective, biased guesswork into a systematic, defensible science. Understanding the three types of validity evidence—content (what the tool covers), construct (what it actually measures), and criterion-related (whether scores predict performance)—enables organizations to build hiring systems that are simultaneously more effective and more fair.
Organizations using selection tools without validity evidence risk expensive bad hires while potentially exposing themselves to discrimination claims. Organizations with strong validity evidence hire better performers, reduce turnover, lower training costs, and can defend their decisions if questioned.
Organization Learning Labs offers comprehensive job analysis, selection tool evaluation, validity evidence development, and outcome monitoring services designed to help organizations build hiring systems grounded in empirical evidence. Contact us at research@organizationlearninglabs.com.
AERA, APA & NCME. (2014). Standards for educational and psychological testing. American Educational Research Association.
Downing, S. M. (2003). Validity: On the meaningful interpretation of assessment data. Medical Education, 37(9), 830-837.
Messick, S. (1995). Validity of psychological assessment: Validation of inferences from persons' responses and performances. American Psychologist, 50(9), 741-749.
Schreurs, S., Cleutjens, K., Collares, C. F., Cleland, J., & oude Egbrink, M. G. A. (2019). Opening the black box of selection. Advances in Health Sciences Education, 25(2), 363-382.
Watrin, L., Geissler, H., Diers, S., & Jia, Y. (2023). The criterion-related validity of conscientiousness in applicant vs. incumbent samples. International Journal of Selection and Assessment, 31(1), 45-63.